Automatic Extraction of Morphological Lexicons from Morphologically Annotated Corpora
نویسندگان
چکیده
We present a method for automatically learning inflectional classes and associated lemmas from morphologically annotated corpora. The method consists of a core languageindependent algorithm, which can be optimized for specific languages. The method is demonstrated on Egyptian Arabic and German, two morphologically rich languages. Our best method for Egyptian Arabic provides an error reduction of 55.6% over a simple baseline; our best method for German achieves a 66.7% error reduction.
منابع مشابه
Morphological Annotation of the Lithuanian Corpus
As the development of information technologies makes progress, large morphologically annotated corpora become a necessity, as they are necessary for moving onto higher levels of language computerisation (e. g. automatic syntactic and semantic analysis, information extraction, machine translation). Research of morphological disambiguation and morphological annotation of the 100 million word Lith...
متن کاملA Resource-light Approach to Russian Morphology: Tagging Russian using Czech resources
In this paper, we describe a resource-light system for the automatic morphological analysis and tagging of Russian. We eschew the use of extensive resources (particularly, large annotated corpora and lexicons), exploiting instead (i) pre-existing annotated corpora of Czech; (ii) an unannotated corpus of Russian. We show that our approach has benefits, and present what we believe to be one of th...
متن کاملMorphologically Annotated Corpora and Morphological Analyzers for Moroccan and Sanaani Yemeni Arabic
We present new language resources for Moroccan and Sanaani Yemeni Arabic. The resources include corpora for each dialect which have been morphologically annotated, and morphological analyzers for each dialect which are derived from these corpora. These are the first sets of resources for Moroccan and Yemeni Arabic. The resources will be made available to the public.
متن کاملDo LSTMs really work so well for PoS tagging? - A replication study
A recent study by Plank et al. (2016) found that LSTM-based PoS taggers considerably improve over the current state-of-theart when evaluated on the corpora of the Universal Dependencies project that use a coarse-grained tagset. We replicate this study using a fresh collection of 27 corpora of 21 languages that are annotated with fine-grained tagsets of varying size. Our replication confirms the...
متن کاملA Cross-language Approach to Rapid Creation of New Morpho-syntactically Annotated Resources
We take a novel approach to rapid, low-cost development of morpho-syntactically annotated resources without using parallel corpora or bilingual lexicons. The overall research question is how to exploit language resources and properties to facilitate and automate the creation of morphologically annotated corpora for new languages. This portability issue is especially relevant to minority languag...
متن کامل